80 research outputs found

    RepSeq-A database of amino acid repeats present in lower eukaryotic pathogens

    BACKGROUND: Amino acid repeat-containing proteins have a broad range of functions and their identification is of relevance to many experimental biologists. In human-infective protozoan parasites (such as the Kinetoplastid and Plasmodium species), they are implicated in immune evasion and have been shown to influence virulence and pathogenicity. RepSeq (http://repseq.gugbe.com) is a new database of amino acid repeat-containing proteins found in lower eukaryotic pathogens. The RepSeq database is accessed via a web-based application which also provides links to related online tools and databases for further analyses. RESULTS: The RepSeq algorithm typically identifies more than 98% of repeat-containing proteins and is capable of identifying both perfect and mismatch repeats. The proportion of proteins that contain repeat elements varies greatly between different families and even species (3-35% of the total protein content). The most common motif type is the Sequence Repeat Region (SRR), a repeated motif containing multiple different amino acid types. Proteins containing Single Amino Acid Repeats (SAARs) and Di-Peptide Repeats (DPRs) typically account for 0.5-1.0% of the total protein number. Notable exceptions are P. falciparum and D. discoideum, in which 33.67% and 34.28% respectively of the predicted proteomes consist of repeat-containing proteins. These numbers are due to large insertions of low complexity single and multi-codon repeat regions. CONCLUSION: The RepSeq database provides a repository for repeat-containing proteins found in parasitic protozoa. The database allows for both individual and cross-species proteome analyses and also allows users to upload sequences of interest for analysis by the RepSeq algorithm. Identification of repeat-containing proteins provides researchers with a defined subset of proteins which can be analysed by expression profiling and functional characterisation, thereby facilitating study of pathogenicity and virulence factors in the parasitic protozoa. While primarily designed for kinetoplastid work, the RepSeq algorithm and database retain full functionality when used to analyse other species.
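
As a rough illustration of what detecting perfect SAARs and DPRs can look like, the sketch below scans a protein sequence with a back-referencing regular expression. It is not the RepSeq algorithm: the function name `find_perfect_repeats` and the copy-number thresholds are hypothetical, the abstract does not state RepSeq's cut-offs, and mismatch repeats are not handled.

```python
import re


def find_perfect_repeats(protein, unit_len, min_copies):
    """Return (start, unit, copies) for perfect tandem repeats of one unit length.

    unit_len=1 reports single amino acid repeats (SAARs) and unit_len=2
    reports di-peptide repeats (DPRs). Thresholds are illustrative only.
    """
    # One captured unit followed by at least (min_copies - 1) exact copies.
    pattern = re.compile(r"([A-Z]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(protein.upper()):
        unit = match.group(1)
        # A unit such as "QQ" is really a single amino acid repeat; skip it here.
        if unit_len > 1 and len(set(unit)) == 1:
            continue
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Made-up sequence: seven N residues, then six copies of "QA".
    sequence = "MK" + "N" * 7 + "TS" + "QA" * 6 + "L"
    print(find_perfect_repeats(sequence, unit_len=1, min_copies=5))
    print(find_perfect_repeats(sequence, unit_len=2, min_copies=4))
```

Start positions are 0-based. Filtering homopolymeric units keeps the two repeat classes from double-counting the same region.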

    Formation of regulatory modules by local sequence duplication

    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms
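
The distance-dependent similarity signal described in this abstract can be pictured with a much simpler calculation than the authors' evolutionary model. The sketch below compares the mean fraction of matching positions for pairs of sites closer than a cutoff with the mean for more distant pairs. All names, the equal-length requirement and the toy input are assumptions made for illustration; only the 50 bp cutoff is taken from the abstract.

```python
from itertools import combinations


def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_by_distance(sites, cutoff=50):
    """Mean pairwise similarity for site pairs within and beyond `cutoff` bp.

    `sites` is a list of (position, sequence) tuples. Returns a
    (mean_near, mean_far) tuple; an entry is None if that group is empty.
    """
    near, far = [], []
    for (pos1, seq1), (pos2, seq2) in combinations(sites, 2):
        bucket = near if abs(pos1 - pos2) <= cutoff else far
        bucket.append(hamming_similarity(seq1, seq2))

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(near), mean(far)


if __name__ == "__main__":
    # Toy module with three hypothetical 6 bp sites.
    toy_sites = [(100, "ACGTAC"), (120, "ACGTAA"), (400, "TTGCAT")]
    print(similarity_by_distance(toy_sites))
```

A real analysis would also need a null expectation for similarity between unrelated sites, which this sketch does not attempt.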

    Genome-wide survey and analysis of microsatellites in nematodes, with a focus on the plant-parasitic species Meloidogyne incognita

    BACKGROUND: Microsatellites are the most popular source of molecular markers for studying population genetic variation in eukaryotes. However, few data are currently available about their genomic distribution and abundance across the phylum Nematoda. The recent completion of the genomes of several nematode species, including Meloidogyne incognita, a major agricultural pest worldwide, now opens the way for a comparative survey and analysis of microsatellites in these organisms. RESULTS: Using MsatFinder, the total numbers of 1-6 bp perfect microsatellites detected in the complete genomes of five nematode species (Brugia malayi, Caenorhabditis elegans, M. hapla, M. incognita, Pristionchus pacificus) ranged from 2,842 to 61,547, and covered from 0.09 to 1.20% of the nematode genomes. Under our search criteria, the most common repeat motifs for each length class varied according to the different nematode species considered, with no obvious relation to the AT-richness of their genomes. Overall, (AT)n, (AG)n and (CT)n were the three most frequent dinucleotide microsatellite motifs found in the five genomes considered. Except for two motifs in P. pacificus, all the most frequent trinucleotide motifs were AT-rich, with (AAT)n and (ATT)n being the only ones common to the five nematode species. Particular attention was paid to the microsatellite content of the plant-parasitic species M. incognita. In this species, a repertoire of 4,880 microsatellite loci was identified, from which 2,183 appeared suitable to design markers for population genetic studies. Interestingly, 1,094 microsatellites were identified in 801 predicted protein-coding regions, 99% of them being trinucleotides. When compared against the InterPro domain database, 497 of these CDS were successfully annotated, and further assigned to Gene Ontology terms. CONCLUSIONS: Contrasting patterns of microsatellite abundance and diversity were characterized in five nematode genomes, even in the case of two closely related Meloidogyne species. 2,245 di- to hexanucleotide loci were identified in the genome of M. incognita, providing adequate material for the future development of a wide range of microsatellite markers in this major plant parasite.
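
To make the idea of a "perfect 1-6 bp microsatellite" concrete, here is a minimal sketch of a regex-based scan. It does not reproduce MsatFinder: the minimum copy numbers in `MIN_COPIES` are hypothetical placeholders (the abstract does not give the search criteria), and the function names are invented for this example.

```python
import re

# Hypothetical minimum copy numbers per motif length (not MsatFinder's settings).
MIN_COPIES = {1: 12, 2: 8, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_microsatellites(dna, min_copies=MIN_COPIES):
    """Return sorted (start, motif, copies) tuples for perfect microsatellites."""
    dna = dna.upper()
    loci = []
    for motif_len, copies_needed in min_copies.items():
        # One captured motif followed by at least (copies_needed - 1) exact copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, copies_needed - 1)
        )
        for match in pattern.finditer(dna):
            motif = match.group(1)
            # Skip e.g. "ATAT", which is a dinucleotide repeat in disguise.
            if is_primitive(motif):
                copies = len(match.group(0)) // motif_len
                loci.append((match.start(), motif, copies))
    return sorted(loci)


if __name__ == "__main__":
    # Made-up fragment containing (AT)9 and (AAT)6.
    fragment = "GGC" + "AT" * 9 + "CCG" + "AAT" * 6 + "TGCA"
    for start, motif, copies in find_microsatellites(fragment):
        print(start, motif, copies)
```

Start positions are 0-based, and the scan covers one strand only, so reverse-complement motifs such as (AG)n and (CT)n are reported separately.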

    XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences

    BACKGROUND: Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed. RESULTS: To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper. CONCLUSION: We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally-evolved modular proteins with applications for engineering novel biostructural and biomimetic materials, and identifying new vaccine and diagnostic targets.

    Novel simple sequence repeats (SSRs) detected by ND-FISH in heterochromatin of Drosophila melanogaster

    BACKGROUND: In recent years, substantial progress has been made in understanding the organization of sequences in heterochromatin regions containing single-copy genes and transposable elements. However, the sequence and organization of tandem repeat DNA sequences, which are by far the majority fraction of D. melanogaster heterochromatin, are little understood. RESULTS: This paper reports that the heterochromatin, as well as containing long tandem arrays of pentanucleotide satellites (AAGAG, AAGAC, AATAT, AATAC and AACAC), is also enriched in other simple sequence repeats (SSRs) such as A, AC, AG, AAG, ACT, GATA and GACA. Non-denaturing FISH (ND-FISH) showed these SSRs to localize to the chromocentre of polytene chromosomes, and was used to map them on mitotic chromosomes. Different distributions were detected ranging from single heterochromatic clusters to complex combinations on different chromosomes. ND-FISH performed on extended DNA fibres, along with Southern blotting, showed the complex organization of these heterochromatin sequences in long tracts, and revealed subclusters of SSRs (several kilobases in length) flanked by other DNA sequences. The chromosomal characterization of C, AAC, AGG, AAT, CCG, ACG, AGC, ATC and ACC provided further detailed information on the SSR content of D. melanogaster at the whole genome level. CONCLUSION: These data clearly show the variation in the abundance of different SSR motifs and reveal their non-random distribution within and between chromosomes. The greater representation of certain SSRs in D. melanogaster heterochromatin suggests that its complexity may be greater than previously thought.

    Simple sequence repeats in Neurospora crassa: distribution, polymorphism and evolutionary inference

    BACKGROUND: Simple sequence repeats (SSRs) have been successfully used for various genetic and evolutionary studies in eukaryotic systems. The eukaryotic model organism Neurospora crassa is an excellent system to study evolution and biological function of SSRs. RESULTS: We identified and characterized 2749 SSRs of 963 SSR types in the genome of N. crassa. The distribution of tri-nucleotide (nt) SSRs, the most common SSRs in N. crassa, was significantly biased in exons. We further characterized the distribution of 19 abundant SSR types (AST), which account for 71% of total SSRs in the N. crassa genome, using a Poisson log-linear model. We also characterized the size variation of SSRs among natural accessions using Polymorphic Index Content (PIC) and ANOVA analyses and found that there are genome-wide, chromosome-dependent and local-specific variations. Using polymorphic SSRs, we have built linkage maps from three line-cross populations. CONCLUSION: Taking our computational, statistical and experimental data together, we conclude that 1) the distributions of the SSRs in the sequenced N. crassa genome differ systematically between chromosomes as well as between SSR types, 2) the size variation of tri-nt SSRs in exons might be an important mechanism in generating functional variation of proteins in N. crassa, 3) there are different levels of evolutionary forces in variation of amino acid repeats, and 4) SSRs are stable molecular markers for genetic studies in N. crassa.
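
Assuming the PIC statistic mentioned in this abstract is the standard polymorphism information content of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, where p_i is the frequency of allele i, it can be computed from observed allele calls as sketched below. The function name and the toy allele sizes are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def polymorphism_information_content(alleles):
    """PIC of one locus from a list of observed allele calls.

    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    where p_i is the frequency of allele i among the calls.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one allele call is required")
    freqs = [count / total for count in counts.values()]
    sum_squares = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_squares - cross_term


if __name__ == "__main__":
    # Hypothetical SSR allele sizes (bp) scored in eight accessions.
    sizes = [152, 152, 155, 155, 158, 158, 161, 161]
    print(polymorphism_information_content(sizes))
```

A monomorphic locus gives 0, and the value rises toward 1 as the number of alleles increases and their frequencies even out.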

    Analyses of carnivore microsatellites and their intimate association with tRNA-derived SINEs

    BACKGROUND: The popularity of microsatellites has greatly increased in the last decade on account of their many applications. However, little is currently understood about the factors that influence their genesis and distribution among and within species genomes. In this work, we analyzed carnivore microsatellite clones from GenBank to study their association with interspersed repeats and elucidate the role of the latter in microsatellite genesis and distribution. RESULTS: We constructed a comprehensive carnivore microsatellite database comprising 1236 clones from GenBank. Thirty-three species of 11 out of 12 carnivore families were represented, although two distantly related species, the domestic dog and cat, were clearly overrepresented. Of these clones, 330 contained tRNA(Lys)-derived SINEs and 357 contained other interspersed repeats. Our rough estimates of tRNA SINE copies per haploid genome were much higher than published ones. Our results also revealed a distinct juxtaposition of AG and A-rich repeats and tRNA(Lys)-derived SINEs, suggesting their coevolution. Both microsatellite types arose repeatedly in two regions of the interspersed repeat. Moreover, microsatellites associated with tRNA(Lys)-derived SINEs showed the highest complexity and lower potential instability. CONCLUSION: Our results suggest that tRNA(Lys)-derived SINEs are a significant source for microsatellite generation in carnivores, especially for AG and A-rich repeat motifs. These observations indicate two modes of microsatellite generation: the expansion and variation of pre-existing tandem repeats and the conversion of sequences with high cryptic simplicity into a repeat array; these mechanisms are not specific to tRNA(Lys)-derived SINEs. Microsatellite and interspersed repeat coevolution could also explain the different distribution of repeat types among and within species genomes. Finally, because microsatellites associated with tRNA(Lys)-derived SINEs have higher complexity and lower potential informative content, we recommend avoiding their use as genetic markers.

    Non-Image-Forming Light Driven Functions Are Preserved in a Mouse Model of Autosomal Dominant Optic Atrophy

    Autosomal dominant optic atrophy (ADOA) is a slowly progressive optic neuropathy that has been associated with mutations of the OPA1 gene. In patients, the disease primarily affects the retinal ganglion cells (RGCs) and causes optic nerve atrophy and visual loss. A subset of RGCs are intrinsically photosensitive, express the photopigment melanopsin and drive non-image-forming (NIF) visual functions including light-driven circadian and sleep behaviours and the pupil light reflex. Given the RGC pathology in ADOA, disruption of NIF functions might be predicted. Interestingly, in ADOA patients the pupil light reflex was preserved, although NIF behavioural outputs were not examined. The B6; C3-Opa1Q285STOP mouse model of ADOA displays optic nerve abnormalities, RGC dendropathy and functional visual disruption. We performed a comprehensive assessment of light-driven NIF functions in this mouse model using wheel running activity monitoring, videotracking and pupillometry. Opa1 mutant mice entrained their activity rhythm to the external light/dark cycle, suppressed their activity in response to acute light exposure at night, generated circadian phase shift responses to 480 nm and 525 nm pulses, demonstrated immobility-defined sleep induction following exposure to a brief light pulse at night and exhibited an intensity-dependent pupil light reflex. There were no significant differences in any parameter tested relative to wildtype littermate controls. Furthermore, there was no significant difference in the number of melanopsin-expressing RGCs, cell morphology or melanopsin transcript levels between genotypes. Taken together, these findings suggest the preservation of NIF functions in Opa1 mutants. The results provide support to growing evidence that the melanopsin-expressing RGCs are protected in mitochondrial optic neuropathies.